Voice-to-remaining audio (VRA) interactive hearing aid and auxiliary equipment

ABSTRACT

An integrated individual listening device and decoder for receiving an audio signal including a decoder for decoding the audio signal by separating the audio signal into a voice signal and a background signal, a first end-user adjustable amplifier coupled to the voice signal and amplifying the voice signal; a second end-user adjustable amplifier coupled to the background signal and amplifying the background signal; a summing amplifier coupled to outputs of said first and second end-user adjustable amplifiers and outputting a total audio signal, said total signal being coupled to an individual listening device.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of U.S. provisional patent application Ser. No. 60/139,243 entitled “Voice-to-Remaining Audio (VRA) Interactive Hearing Aid & Auxiliary Equipment,” filed on Jun. 15, 1999.

This patent application is a reissue application for commonly assigned U.S. Pat. No. 6,985,594, issued from U.S. patent application Ser. No. 09/593,149, filed on Jun. 14, 2000.

FIELD OF THE INVENTION

Embodiments of the present invention relate generally to processing audio signals, and more particularly, to a method and apparatus for processing audio signals such that hearing impaired listeners can adjust the level of voice-to-remaining audio (VRA) to improve their listening experience.

BACKGROUND OF THE INVENTION

Over the course of life, one's hearing becomes compromised due to many factors, such as age, genetics, disease, and environmental effects. Usually, the deterioration is specific to certain frequency ranges.

In addition to permanent hearing impairments, one may experience temporary hearing impairments due to exposure to particular high sound levels. For example, after target shooting or attending a rock concert one may have temporary hearing impairments that improve somewhat, but over time may accumulate to a permanent hearing impairment. Even lower but longer-lasting sound levels, such as those encountered when working in a factory or teaching in an elementary school, may have temporary impacts on one's hearing.

Typically, one compensates for hearing loss or impairment by increasing the volume of the audio. But, this simply increases the volume of all audible frequencies in the total signal. The resulting increase in total signal volume will provide little or no improvement in speech intelligibility, particularly for those whose hearing impairment is frequency dependent.

While hearing impairment increases generally with age, many hearing impaired individuals refuse to admit that they are hard of hearing, and therefore avoid the use of devices that may improve the quality of their hearing. While many elderly people begin wearing glasses as they age, a significantly smaller number of these individuals wear hearing aids, despite the significant advances in the reduction of the size of hearing aids. This phenomenon is indicative of the apparent societal stigma associated with hearing aids and/or hearing impairments. Consequently, it is desirable to provide a technique for improving the listening experience of a hearing impaired listener in a way that avoids the apparent associated societal stigma.

Most audio programming, be it television audio, movie audio, or music, can be divided into two distinct components: the foreground and the background. In general, the foreground sounds are the ones intended to capture the audience's attention and retain their focus, whereas the background sounds are supporting, but not of primary interest to the audience. One example of this can be seen in television programming for a “sitcom,” in which the main characters' voices deliver and develop the plot of the story while sound effects, audience laughter, and music fill the gaps.

Currently, the listening audience for all types of audio media is restricted to the mixture decided upon by the audio engineer during production. The audio engineer will mix all other background noise components with the foreground sounds at levels that the audio engineer prefers, or that the audio engineer understands to have some historical basis. This mixture is then sent to the end-user as either a single (mono) signal or in some cases as a stereo (left and right) signal, without any means for adjusting the foreground to the background.

The lack of this ability to adjust foreground relative to background sounds is particularly difficult for the hearing impaired. In many cases, programming is difficult to understand (at best) due to background audio masking the foreground signals.

There are many new digital audio formats available. Some of these have attempted to provide capability for the hearing impaired. For example, Dolby Digital, also referred to as AC-3 (or Audio Codec version 3), is a compression technique for digital audio that packs more data into a smaller space. The future of digital audio is in spatial positioning, which is accomplished by providing 5.1 separate audio channels: Center, Left and Right, and Left and Right Surround. The sixth channel, referred to as the 0.1 channel, is a limited bandwidth low frequency effects (LFE) channel that is mostly non-directional due to its low frequencies. Since there are 5.1 audio channels to transmit, compression is necessary to ensure that both video and audio stay within certain bandwidth constraints. These constraints (imposed by the Federal Communications Commission (FCC)) are currently more strict for terrestrial transmission than for digital video disks (DVDs). There is more than enough space on a DVD to provide the end-user with uncompressed audio (much more desirable from a listening standpoint). Video data is compressed most commonly through MPEG (moving pictures experts group) developed techniques, although they also have an audio compression technique very similar to Dolby's.

The DVD industry has adopted Dolby Digital (DD) as its compression technique of choice. Most DVDs are produced using DD. The ATSC (Advanced Television Systems Committee) has also chosen AC-3 as its audio compression scheme for American digital TV. This has spread to many other countries around the world. This means that production studios (movie and television) must encode their audio in DD for broadcast or recording.

There are many features, in addition to the strict encoding and decoding scheme, that are frequently discussed in conjunction with Dolby Digital. Some of these features are part of DD and some are not. Along with the compressed bitstream, DD sends information about the bitstream called metadata, or “data about the data.” It is basically zeros and ones indicating the existence of options available to the end-user. Three of these options are dialnorm (dialog normalization), dynrng (dynamic range), and bsmod (bit stream mode that controls the main and associated audio services). The first two are an integral part of DD already, since many decoders handle these variables, giving end-users the ability to adjust them. The third bit of information, bsmod, is described in detail in ATSC document A/54 (not a Dolby publication) but also exists as part of the DD bitstream. The value of bsmod alerts the decoder about the nature of the incoming audio service, including the presence of any associated audio service. At this time, no known manufacturers are utilizing this parameter. Multiple language DVD performances are currently provided via multiple complete main audio programs on one of the eight available audio tracks on the DVD.

The dialnorm parameter is designed to allow the listener to normalize all audio programs relative to a constant voice level. Between channels and between program and commercial, overall audio levels fluctuate wildly. In the future, producers will be asked to insert the dialnorm parameter, which indicates the sound pressure level (SPL) at which the dialog has been recorded. If this value is set as 80 dB for a program but 90 dB for a commercial, the television will decode that information, examine the level the end-user has entered as desirable (say 85 dB), and adjust the program up 5 dB and the commercial down 5 dB. This is a total volume level adjustment that is based on what the producer enters as the dialnorm bit value.
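The arithmetic in the preceding example can be sketched as follows. This is a minimal illustration of the simplified description given above, not of an actual AC-3 decoder; the function name dialnorm_gain_db and its parameters are hypothetical.

```python
def dialnorm_gain_db(dialog_level_db, desired_level_db):
    """Gain (in dB) that moves a program's dialog level to the listener's
    desired level. Hypothetical helper mirroring the text's example only."""
    return desired_level_db - dialog_level_db


# Values taken from the example in the text.
desired = 85      # level entered by the end-user
program = 80      # dialnorm value supplied for the program
commercial = 90   # dialnorm value supplied for the commercial

print(dialnorm_gain_db(program, desired))     # 5  (program raised 5 dB)
print(dialnorm_gain_db(commercial, desired))  # -5 (commercial lowered 5 dB)
```

Note that the same gain is applied to the whole program, so the balance between dialog and the remaining audio is unchanged.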

A section from the AC-3 description (from document A/52) provides the best description of this technology. “The dynrng values typically indicate gain reduction during the loudest signal passages, and gain increase during the quiet passages. For the listener, it is desirable to bring the loudest sounds down in level towards the dialog level, and the quiet sounds up in level, again towards dialog level. Sounds which are at the same loudness as the normal spoken dialogue will typically not have their gain changed.”

The dynrng variable provides the end-user with an adjustable parameter that will control the amount of compression occurring on the total volume with respect to the dialog level. This essentially limits the dynamic range of the total audio program about the mean dialog level. This does not, however, provide any way to adjust the dialog level independently of the remaining audio level.
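The effect described above, in which loud and quiet passages are pulled toward the dialog level while sounds at the dialog level are left alone, can be illustrated with a simple linear model. This is an assumed model for illustration only and is not the AC-3 dynamic range algorithm; the function name and the scale parameter are hypothetical.

```python
def compress_about_dialog(level_db, dialog_db, scale):
    """Pull a signal level toward the dialog level.

    scale = 1.0 leaves levels unchanged; scale = 0.0 collapses every
    level onto the dialog level. Illustrative linear model only.
    """
    return dialog_db + scale * (level_db - dialog_db)


dialog = 80.0
for level in (100.0, 80.0, 60.0):
    # With scale 0.5, a loud passage comes down, a quiet one comes up,
    # and a passage at dialog level is not changed.
    print(level, "->", compress_about_dialog(level, dialog, 0.5))
```

In this model the dialog level itself never moves, which is why such a control cannot change the ratio of dialog to remaining audio.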

One attempt to improve the listening experience of hearing impaired listeners is provided for in the ATSC Digital Television Standard (Annex B). Section 6 of Annex B of the ATSC standard describes the main audio services and the associated audio services. An AC-3 elementary stream contains the encoded representation of a single audio service. Multiple audio services are provided by multiple elementary streams. Each elementary stream is conveyed by the transport multiplex with a unique PID. There are a number of audio service types which may be individually coded into each elementary stream. One of the audio service types is called the complete main audio service (CM). The CM type of main audio service contains a complete audio program (complete with dialogue, music and effects). The CM service may contain from 1 to 5.1 audio channels. The CM service may be further enhanced by means of the other services. Another audio service type is the hearing impaired service (HI). The HI associated service typically contains only dialogue which is intended to be reproduced simultaneously with the CM service. In this case, the HI service is a single audio channel. As stated therein, this dialogue may be processed for improved intelligibility by hearing impaired listeners. Simultaneous reproduction of both the CM and HI services allows the hearing impaired listener to hear a mix of the CM and HI services in order to emphasize the dialogue while still providing some music and effects. Besides providing the HI service as a single dialogue channel, the HI service may be provided as a complete program mix containing music, effects, and dialogue with enhanced intelligibility. In this case, the service may be coded using any number of channels (up to 5.1). While this service may improve the listening experience for some hearing impaired individuals, it certainly will not for those who do not employ the prescribed receiver for fear of being stigmatized as hearing impaired. Finally, any processing of the dialogue for hearing impaired individuals prevents the use of this channel in creating an audio program for non-hearing impaired individuals. Moreover, the relationship between the HI service and the CM service set forth in Annex B remains undefined with respect to the relative signal levels of each used to create a channel for the hearing impaired.

Other techniques have been employed to attempt to improve the intelligibility of audio. For example, U.S. Pat. No. 4,024,344 discloses a method of creating a “center channel” for dialogue in cinema sound. The technique disclosed therein correlates left and right stereophonic channels and adjusts the gain on either the combined and/or the separate left or right channel depending on the degree of correlation between the left and right channel. The assumption is that a strong correlation between the left and right channels indicates the presence of dialogue. The center channel, which is the filtered summation of the left and right channels, is amplified or attenuated depending on the degree of correlation between the left and right channels. The problem with this approach is that it does not discriminate between meaningful dialogue and simple correlated sound, nor does it address unwanted voice information within the voice band. Therefore, it cannot improve the intelligibility of all audio for all hearing impaired individuals.

In general, the previously cited inventions of Dolby and others have all attempted to modify some content of the audio signal through various signal processing hardware or algorithms, but those methods do not satisfy the individual needs or preferences of different listeners. In sum, all of these techniques provide a less than optimum listening experience for hearing impaired individuals as well as non-hearing impaired individuals.

Finally, miniaturized electronics and high quality digital audio have brought about a revolution in digital hearing aid technology. In addition, the latest standards of digital audio transmission and recordings, including DVD (in all formats), digital television, Internet radio, and digital radio, are incorporating sophisticated compression methods that allow an end-user unprecedented control over audio programming. The combination of these two technologies has presented improved methods for providing hearing impaired end-users with the ability to enjoy digital audio programming. This combination, however, fails to address all of the needs and concerns of different hearing impaired end-users.

The present invention is therefore directed to the problem of developing a system and method for processing audio signals that optimizes the listening experience for hearing impaired listeners, as well as non-hearing impaired listeners, individually or collectively.

SUMMARY OF THE INVENTION

An integrated individual listening device and decoder for receiving an audio signal including a decoder for decoding the audio signal by separating the audio signal into a voice signal and a background signal, a first end-user adjustable amplifier coupled to the voice signal and amplifying the voice signal, a second end-user adjustable amplifier coupled to the background signal and amplifying the background signal, a summing amplifier coupled to outputs of said first and second end-user adjustable amplifiers and outputting a total audio signal, said total signal being coupled to an individual listening device.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a general approach according to the present invention for separating relevant voice information from general background audio in a recorded or broadcast program.

FIG. 2 illustrates an exemplary embodiment according to the present invention for receiving and playing back the encoded program signals.

FIG. 3 illustrates an exemplary embodiment of a conventional individual listening device such as a hearing aid.

FIG. 4 is a block diagram illustrating a voice-to-remaining audio (VRA) system for simultaneous multiple end-users.

FIG. 5 is a block diagram illustrating a decoder that sends wireless transmission to individual listening devices according to an embodiment of the present invention.

FIG. 6 is an illustration of ambient sound arriving at both the hearing aid's microphone and the end-user's ear.

FIG. 7 is an illustration of an earplug used with the hearing aid shown in FIG. 6.

FIG. 8 is a block diagram of signal paths reaching a hearing impaired end-user through a decoder enabled hearing aid according to an embodiment of the present invention.

FIG. 9 is a block diagram of signal paths reaching a hearing impaired end-user incorporating an adaptive noise canceling algorithm.

FIG. 10 is a block diagram of signal paths reaching a hearing impaired end-user through a decoder according to an alternative embodiment of the present invention.

FIG. 11 illustrates another embodiment of the present invention.

FIG. 12 illustrates an alternative embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention are directed to an integrated individual listening device and decoder. An example of one such decoder is a Dolby Digital (DD) decoder. As stated above, Dolby Digital is an audio compression standard that has gained popularity for use in terrestrial broadcast and recording media. Although the discussion herein uses a DD decoder, other types of decoders may be used without departing from the spirit and scope of the present invention. Moreover, other digital audio standards besides Dolby Digital are not precluded. This embodiment allows a hearing impaired end-user in a listening environment with other listeners, to take advantage of the “Hearing Impaired Associated Audio Service” provided by DD without affecting the listening enjoyment of the other listeners. As used herein, the term “end-user” refers to a consumer, listener or listeners of a broadcast or sound recording or a person or persons receiving an audio signal on an audio media that is distributed by recording or broadcast. In addition, the term “individual listening device” refers to hearing aids, headsets, assistive listening devices, cochlear implants or other devices that assist the end-user's listening ability. Further, the term “preferred audio” refers to the preferred signal, voice component, voice information, or primary voice component of an audio signal and the term “remaining audio” refers to the background, musical or non-voice component of an audio signal.

Other embodiments of the present invention relate to a decoder that sends wireless transmissions directly to an individual listening device such as a hearing aid or cochlear implant. Used in conjunction with the “Hearing Impaired Associated Audio Service” provided by DD, which provides separate dialog along with a main program, the decoder provides the hearing impaired end-user with adjustment capability for improved intelligibility with other listeners in the same listening environment while the other listeners enjoy the unaffected main program.

Further embodiments of the present invention relate to an intercept box which services the communications market when broadcast companies transition from analog transmission to digital transmission. The intercept box allows the end-user to take advantage of the hearing impaired mode (HI) without having a fully functional main/associated audio service decoder. The intercept box decodes transmitted digital information and allows the end-user to adjust hearing impaired parameters with analog style controls. This analog signal is also fed directly to an analog play device such as a television. According to the present invention, the intercept box can be used with individual listening devices such as hearing aids or it can allow digital services to be made available to the analog end-user during the transition period.

Significance of Ratio of Preferred Audio to Remaining Audio

The present invention begins with the realization that the listening preferential range of a ratio of a preferred audio signal relative to any remaining audio is rather large, and certainly larger than ever expected. This significant discovery is the result of a test of a small sample of the population regarding their preferences of the ratio of the preferred audio signal level to a signal level of all remaining audio.

Specific Adjustment of Desired Range for Hearing Impaired or Normal Listeners

Very directed research has been conducted in the area of understanding how normal and hearing impaired end-users perceive the ratio between dialog and remaining audio for different types of audio programming. It has been found that the population varies widely in the range of adjustment desired between voice and remaining audio.

Two experiments have been conducted on a random sample of the population including elementary school children, middle school children, middle-aged citizens and senior citizens. A total of 71 people were tested. The test consisted of asking the end-user to adjust the level of voice and the level of remaining audio for a football game (where the remaining audio was the crowd noise) and a popular song (where the remaining audio was the music). A metric called the VRA (voice to remaining audio) ratio was formed by dividing the linear value of the volume of the dialog or voice by the linear value of the volume of the remaining audio for each selection.
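The VRA metric defined above can be written as a one-line computation. The following is a minimal sketch; the function name vra_ratio is hypothetical, and returning infinity when the remaining audio is turned off entirely is an assumption made here for illustration, not something the tests above specify.

```python
import math


def vra_ratio(voice_level, remaining_level):
    """Voice-to-remaining-audio ratio from two linear volume values."""
    if remaining_level == 0:
        # Assumption: a listener who removes the remaining audio
        # completely is treated as preferring an unbounded ratio.
        return math.inf
    return voice_level / remaining_level


print(vra_ratio(3.0, 1.5))  # 2.0: voice set twice as high as remaining audio
print(vra_ratio(1.0, 4.0))  # 0.25: remaining audio dominates
```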

Several things were made clear as a result of this testing. First, no two people prefer the identical ratio for voice and remaining audio for both the sports and music media. This is very important since the population has relied upon producers to provide a VRA (which cannot be adjusted by the consumer) that will appeal to everyone. This can clearly not occur, given the results of these tests. Second, while the VRA is typically higher for those with hearing impairments (to improve intelligibility) those people with normal hearing also prefer different ratios than are currently provided by the producers.

It is also important to highlight the fact that any device that provides adjustment of the VRA must provide at least as much adjustment capability as is inferred from these tests in order for it to satisfy a significant segment of the population. Since the video and home theater medium supplies a variety of programming, we should consider that the ratio should extend from at least the lowest measured ratio for any media (music or sports) to the highest ratio from music or sports. This would be 0.1 to 20.17, or a range in decibels of 46 dB. It should also be noted that this is merely a sampling of the population and that the adjustment capability should theoretically be infinite since it is very likely that one person may prefer no crowd noise when viewing a sports broadcast and that another person would prefer no announcement. Note that this type of study and the specific desire for widely varying VRA ratios has not been reported or discussed in the literature or prior art.
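The 46 dB figure can be checked directly from the two measured ratios. The sketch below assumes the ratios are amplitude (linear volume) ratios, so that decibels are computed as 20 times the base-10 logarithm; this convention is an assumption, although it is the one that reproduces the stated 46 dB. The function name ratio_to_db is hypothetical.

```python
import math


def ratio_to_db(ratio):
    """Convert a linear amplitude ratio to decibels (20 * log10)."""
    return 20 * math.log10(ratio)


lowest, highest = 0.1, 20.17  # lowest and highest measured VRA ratios
span_db = ratio_to_db(highest) - ratio_to_db(lowest)
print(round(span_db))  # 46
```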

In this test, an older group of men was selected and asked to do an adjustment (which test was later performed on a group of students) between a fixed background noise and the voice of an announcer, in which only the latter could be varied and the former was set at 6.00. The results with the older group were as follows:

TABLE I

Individual   Setting
1            7.50
2            4.50
3            4.00
4            7.50
5            3.00
6            7.00
7            6.50
8            7.75
9            5.50
10           7.00
11           5.00

To further illustrate the fact that people of all ages have different hearing needs and preferences, a group of 21 college students was selected to listen to a mixture of voice and background and to select, by making one adjustment to the voice level, the ratio of the voice to the background. The background noise, in this case crowd noise at a football game, was fixed at a setting of six (6.00) and the students were allowed to adjust the volume of the announcers' play by play voice which had been recorded separately and was pure voice or mostly pure voice. In other words, the students were selected to do the same test the group of older men did. Students were selected so as to minimize hearing infirmities caused by age. The students were all in their late teens or early twenties. The results were as follows:

TABLE II

Student   Setting of Voice
1         4.75
2         3.75
3         4.25
4         4.50
5         5.20
6         5.75
7         4.25
8         6.70
9         3.25
10        6.00
11        5.00
12        5.25
13        3.00
14        4.25
15        3.25
16        3.00
17        6.00
18        2.00
19        4.00
20        5.50
21        6.00

The ages of the older group (as seen in Table I) ranged from 36 to 59 with the preponderance of the individuals being in the 40 or 50 year old group. As is indicated by the test results, the average setting tended to be reasonably high indicating some loss of hearing across the board. The range again varied from 3.00 to 7.75, a spread of 4.75 which confirmed the findings of the range of variance in people's preferred listening ratio of voice to background or any preferred signal to remaining audio (PSRA). The overall span for the volume setting for both groups of subjects ranged from 2.0 to 7.75. These levels represent the actual values on the volume adjustment mechanism used to perform this experiment. They provide an indication of the range of signal to noise values (when compared to the “noise” level 6.0) that may be desirable from different end-users.
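The ranges quoted for the two groups can be reproduced from the settings listed in Tables I and II. The following sketch simply tabulates the minimum, maximum and spread of each group; the variable and function names are hypothetical.

```python
# Voice settings copied from Table I (older group) and Table II (students).
older_group = [7.50, 4.50, 4.00, 7.50, 3.00, 7.00, 6.50, 7.75, 5.50, 7.00, 5.00]
students = [4.75, 3.75, 4.25, 4.50, 5.20, 5.75, 4.25, 6.70, 3.25, 6.00, 5.00,
            5.25, 3.00, 4.25, 3.25, 3.00, 6.00, 2.00, 4.00, 5.50, 6.00]


def span(settings):
    """Return (lowest, highest, spread) for a list of volume settings."""
    low, high = min(settings), max(settings)
    return low, high, round(high - low, 2)


print(span(older_group))             # (3.0, 7.75, 4.75)
print(span(students))                # (2.0, 6.7, 4.7)
print(span(older_group + students))  # (2.0, 7.75, 5.75)
```

These are settings on the volume control used in the experiment, not decibel values, so the spreads indicate only how far apart the listeners' choices were on that control.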

To gain a better understanding of how this relates to relative loudness variations chosen by different end-users, consider that the non-linear volume control variation from 2.0 to 7.75 represents an increase of 20 dB or ten (10) times. Thus, for even this small sampling of the population and single type of audio programming it was found that different listeners do prefer quite drastically different levels of “preferred signal” with respect to “remaining audio.” This preference cuts across age groups showing that it is consistent with individual preference and basic hearing abilities, which was heretofore totally unexpected.

As the test results show, the range that students (as seen in Table II) without hearing infirmities caused by age selected varied considerably from a low setting of 2.00 to a high of 6.70, a spread of 4.70 or almost one half of the total range of from 1 to 10. The test is illustrative of how the “one size fits all” mentality of most recorded and broadcast audio signals falls far short of giving the individual listener the ability to adjust the mix to suit his or her own preferences and hearing needs. Again, the students had a wide spread in their settings as did the older group, demonstrating the individual differences in preferences and hearing needs. One result of this test is that hearing preferences are widely disparate.

Further testing has confirmed this result over a larger sample group. Moreover, the results vary depending upon the type of audio. For example, when the audio source was music, the ratio of voice to remaining audio varied from approximately zero to about 10, whereas when the audio source was sports programming, the same ratio varied between approximately zero and about 20. In addition, the standard deviation increased by a factor of almost three, while the mean increased by more than twice that of music.

The end result of the above testing is that if one selects a preferred audio to remaining audio ratio and fixes that forever, one has most likely created an audio program that is less than desirable for a significant fraction of the population. And, as stated above, the optimum ratio may be both a short-term and long-term time varying function. Consequently, complete control over this preferred audio to remaining audio ratio is desirable to satisfy the listening needs of “normal” or non-hearing impaired listeners. Moreover, providing the end-user with the ultimate control over this ratio allows the end-user to optimize his or her listening experience.

The end-user's independent adjustment of the preferred audio signal and the remaining audio signal will be the apparent manifestation of one aspect of the present invention. To illustrate the details of the present invention, consider the application where the preferred audio signal is the relevant voice information.

Creation of the Preferred Audio Signal and the Remaining Audio Signal

FIG. 1 illustrates a general approach to separating relevant voice information from general background audio in a recorded or broadcast program. There will first need to be a determination made by the programming director as to the definition of relevant voice. An actor, group of actors, or commentators must be identified as the relevant speakers.

Once the relevant speakers are identified, their voices will be picked up by the voice microphone 301. The voice microphone 301 will need to be either a close talking microphone (in the case of commentators) or a highly directional shot gun microphone used in sound recording. In addition to being highly directional, these microphones 301 will need to be voice-band limited, preferably from 200-5000 Hz. The combination of directionality and band pass filtering minimizes the background noise acoustically coupled to the relevant voice information upon recording. In the case of certain types of programming, the need to prevent acoustic coupling can be avoided by recording relevant voice or dialogue off-line and dubbing the dialogue where appropriate with the video portion of the program. The background microphones 302 should be fairly broadband to provide the full audio quality of background information, such as music.

A camera 303 will be used to provide the video portion of the program. The audio signals (background and relevant voice) will be encoded with the video signal at the encoder 304. In general, the audio signal is usually separated from the video signal by simply modulating it with a different carrier frequency. Since most broadcasts are now in stereo, one way to encode the relevant voice information with the background is to multiplex the relevant voice information on the separate stereo channels in much the same way left front and right front channels are added to two channel stereo to produce a quadraphonic disc recording. Although this would create the need for additional broadcast bandwidth, for recorded media this would not present a problem, as long as the audio circuitry in the video disc or tape player is designed to demodulate the relevant voice information.

Once the signals are encoded, by whatever means deemed appropriate, the encoded signals are sent out for broadcast by broadcast system 305 over antenna 313, or recorded on to tape or disc by recording system 306. In case of recorded audio video information, the background and voice information could be simply placed on separate recording tracks.

Receiving and Demodulating the Preferred Audio Signal and the Remaining Audio

FIG. 2 illustrates an exemplary embodiment for receiving and playing back the encoded program signals. A receiver system 307 demodulates the main carrier frequency from the encoded audio/video signals, in the case of broadcast information. In the case of recorded media 314, the heads from a VCR or the laser reader from a CD player 308 would produce the encoded audio/video signals.

In either case, these signals would be sent to a decoding system 309. The decoder 309 would separate the signals into video, voice audio, and background audio using standard decoding techniques such as envelope detection in combination with frequency or time division demodulation. The background audio signal is sent to a separate variable gain amplifier 310, that the listener can adjust to his or her preference. The voice signal is sent to a variable gain amplifier 311, that can be adjusted by the listener to his or her particular needs, as discussed above.

The two adjusted signals are summed by a unity gain summing amplifier 312 to produce the final audio output. Alternatively, the two adjusted signals are summed by unity gain summing amplifier 312 and further adjusted by variable gain amplifier 315 to produce the final audio output. In this manner the listener can adjust relevant voice to background levels to optimize the audio program to his or her unique listening requirements at the time of playing the audio program. Because the ratio setting may need to change each time the same listener plays the same audio, due to changes in the listener's hearing, the setting remains infinitely adjustable to accommodate this flexibility.
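The playback path described above (separate variable gain on the voice and background signals, a unity gain sum, and an optional overall gain) can be sketched on sampled signals as follows. This is an illustrative model only; the function name mix_vra and its parameters are hypothetical, and the analog amplifiers 310, 311, 312 and 315 are represented here simply as multiplications and an addition.

```python
def mix_vra(voice, background, voice_gain, background_gain, overall_gain=1.0):
    """Apply listener-set gains to voice and background, then sum.

    voice_gain      models variable gain amplifier 311
    background_gain models variable gain amplifier 310
    the addition    models unity gain summing amplifier 312
    overall_gain    models the optional variable gain amplifier 315
    """
    if len(voice) != len(background):
        raise ValueError("voice and background must have the same length")
    return [overall_gain * (voice_gain * v + background_gain * b)
            for v, b in zip(voice, background)]


voice = [1.0, 2.0]
background = [4.0, 8.0]

# A listener who wants more voice and less background:
print(mix_vra(voice, background, voice_gain=2.0, background_gain=0.5))
# [4.0, 8.0]
```

Because the two gains are independent, the listener can set any voice-to-background ratio, including removing either component entirely by setting its gain to zero.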

Configuration of a Typical Individual Listening Device

FIG. 3 illustrates an exemplary embodiment of a conventional individual listening device such as a hearing aid 10. Hearing aid 10 includes a microphone 11, a preamplifier 12, a variable amplifier 13, a power amplifier 14 and an actuator 15. Microphone 11 is typically positioned in hearing aid 10 such that it faces outward to detect ambient environmental sounds in close proximity to the end-user's ear. Microphone 11 receives the ambient environmental sounds as an acoustic pressure and converts the acoustic pressure into an electrical signal. Microphone 11 is coupled to preamplifier 12 which receives the electrical signal. The electrical signal is processed by preamplifier 12, which produces a higher amplitude electrical signal. This higher amplitude electrical signal is forwarded to end-user controlled variable amplifier 13. End-user controlled variable amplifier 13 is connected to a dial on the outside of the hearing aid. Thus, the end-user has the ability to control the volume of the microphone signal (which is the total of all ambient sound). The output of the end-user controlled variable amplifier 13 is sent to power amplifier 14 where the electrical signal is provided with power in order to drive actuator/speaker 15. Actuator/speaker 15 is positioned inside the ear canal of the end-user. Actuator/speaker 15 converts the electrical signal output from power amplifier 14 into an acoustic signal that is an amplified version of the microphone signal representing the ambient noise. Acoustic feedback from the actuator to the microphone 11 is avoided by placing the actuator/speaker 15 inside the ear canal and the microphone 11 outside the ear canal.

Although the components of a hearing aid have been illustrated above, other individual listening devices, as discussed above, can be used with the present invention.

Individual Listening Device and Decoder

In a room listening environment, there may be a combination of listeners with varying degrees of hearing impairment as well as listeners with normal hearing. A hearing aid or other listening device as described above can be equipped with a decoder that receives a digital signal from a programming source and separately decodes the signal, providing the end-user access to the voice (for example, the hearing impaired associated service) without affecting the listening environment of other listeners.

As stated above, the preferred ratio of voice to remaining audio differs significantly for different people, especially hearing impaired people, and differs for different types of programming (sports versus music, etc.). FIG. 4 is a block diagram illustrating a VRA system for simultaneous multiple end-users according to an embodiment of the present invention. The system includes a bitstream source 220, a system decoder 221, a repeater 222 and a plurality of personal VRA decoders 223 that are integrated with or connected to individual listening devices 224. Typically, a digital source (DVD, digital television broadcast, etc.) provides a digital information signal containing compressed digital and video information. For example, Dolby Digital provides a digital information signal having an audio program such as the music and effects (ME) signal and a hearing impaired (HI) signal which is part of the Dolby Digital associated services. According to one embodiment of the present invention, the digital information signal includes a separate voice component signal (e.g., HI signal) and a remaining audio component signal (e.g., ME or CE signal) simultaneously transmitted as a single bitstream to system decoder 221.

According to one embodiment of the present invention, the bitstream from bitstream source 220 is also supplied to repeater 222. Repeater 222 retransmits the bitstream to a plurality of personal VRA decoders 223. Each personal VRA decoder 223 includes a demodulator 266 and a decoder 267 for decoding the bitstream, and variable amplifiers 225 and 226 for adjusting the voice component signal and the remaining audio component signal, respectively. The adjusted signal components are downmixed by summer 227 and may be further adjusted by variable amplifier 281. The adjusted signal is then sent to individual listening devices 224. According to one embodiment of the present invention, the personal VRA decoder is interfaced with the individual listening device and forms one unit, which is denoted as 250. Alternatively, personal VRA decoder 223 and individual listening device 224 may be separate devices that communicate in a wired or wireless manner. Individual listening device 224 may be a hearing aid having the components shown in FIG. 3. As such, the output of personal VRA decoder 223 is fed to end-user controlled amplifier 13 for further adjustment by the end-user. Although three personal VRA decoders and associated individual listening devices are shown, more personal VRA decoders and associated individual listening devices can be used without departing from the spirit and scope of the present invention.

For 5.1 channel programming, voice is primarily placed on the center channel while the remaining audio resides on the left, right, left surround, and right surround channels. For end-users with individual listening devices, spatial positioning of the sound is of little concern since most have severe difficulty with speech intelligibility. By allowing the end-user to adjust the level of the center channel with respect to the other 4.1 channels, an improvement in speech intelligibility can be provided. These 5.1 channels are then downmixed to 2 channels, with the volume adjustment of the center channel allowing the improvement in speech intelligibility without relying on the hearing impaired mode mentioned above. This aspect of the present invention has an advantage over the fully functional AC3-type, in that an end-user can obtain limited VRA adjustment without the need of a separate dialog channel such as the hearing impaired mode.
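As a rough illustration of the center-channel adjustment followed by a two-channel downmix, the sketch below processes one 5.1 sample frame. The text does not specify downmix coefficients, so the weights used here (equal split of the center, attenuated surrounds and subwoofer) are assumptions chosen only for the example, as are the function and key names.

```python
import math

# Assumed mixing weights; the source text does not give downmix coefficients.
CENTER_WEIGHT = 1 / math.sqrt(2)    # center split equally into both outputs
SURROUND_WEIGHT = 1 / math.sqrt(2)  # surrounds attenuated before mixing
LFE_WEIGHT = 0.5                    # subwoofer contribution to each output


def downmix_with_center_gain(frame, center_gain=1.0, remaining_gain=1.0):
    """Downmix one 5.1 frame to (left, right) with an adjustable center level.

    frame is a dict with keys "L", "R", "C", "Ls", "Rs" and "LFE".
    center_gain scales the (mostly dialog) center channel; remaining_gain
    scales the other 4.1 channels together.
    """
    center = center_gain * CENTER_WEIGHT * frame["C"]
    lfe = LFE_WEIGHT * frame["LFE"]
    left = center + remaining_gain * (frame["L"] + SURROUND_WEIGHT * frame["Ls"] + lfe)
    right = center + remaining_gain * (frame["R"] + SURROUND_WEIGHT * frame["Rs"] + lfe)
    return left, right


# Example frame: dialog on the center, some music on the left and right.
frame = {"L": 0.2, "R": 0.1, "C": 0.6, "Ls": 0.0, "Rs": 0.0, "LFE": 0.0}
left, right = downmix_with_center_gain(frame, center_gain=2.0, remaining_gain=0.5)
```

Because the center is scaled before the downmix, the listener changes the voice-to-remaining ratio without needing a separate dialog channel, which is the limited VRA adjustment the paragraph describes.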

FIG. 5 illustrates a decoder that sends wireless transmission directly to an individual listening device according to an embodiment of the present invention. As described above, digital bitstream source 220 provides the digital bitstream to the system decoder 221. If there is no metadata useful to the hearing impaired listener (i.e., absence of the HI mode), there is no need to transmit the entire digital bitstream, only the audio signals. Note that this is a small deviation from the concept of having a digital decoder in the hearing aid itself, but it is also meant to provide the same service to the hearing impaired individual. At system reproduction 230, the 5.1 audio channels are separated into center (containing mostly dialog, depending on production practices) and the rest, containing mostly music and effects that might reduce intelligibility. The 5.1 audio signals are also fed to transceiver 260. Transceiver 260 receives and retransmits the signals to a plurality of VRA receiving devices 270. VRA receiving devices 270 include circuitry such as demodulators for removing the carrier signal of the transmitted signal. The carrier signal is a signal used to transport or “carry” the information of the output signal. Demodulation yields left, right, left surround, right surround, and sub (remaining audio) channel signals and a center (preferred) channel signal. The preferred channel signal is adjusted using variable amplifier 225 while the remaining audio signal (the combination of the left, right, left surround, right surround and subwoofer) is adjusted using variable amplifier 226. The output from each of these variable amplifiers is fed to summer 227, and the output from summer 227 may be adjusted using variable amplifier 281. This summed and adjusted electrical signal is supplied to end-user controlled amplifier 13 and later sent to power amplifier 14. The amplified electrical signal is then converted into an amplified acoustical signal presented to the end-user. According to the embodiment described above, multiple end-users can simultaneously receive the output signal for VRA adjustments.

FIGS. 6-7 describe several related features used in association with the present invention. FIG. 6 illustrates ambient sound (which contains the same digital audio programming) arriving at both the hearing aid's microphone 11 and the end-user's ear. The ambient sound received by the microphone will not be synchronized perfectly with the sound arriving via the personal VRA decoder 223 attached to the hearing aid, because the two transmission paths differ significantly. The personal VRA decoder provides a signal that has traveled a purely electronic path, at the speed of light, with no added acoustical features. The ambient sound, however, travels a path to the end-user from the sound source at the speed of sound and also contains reverberation artifacts defined by the acoustics of the environment where the end-user is located. If the end-user has at least some unassisted hearing capability, turning the ambient microphone of the hearing aid off will not completely remedy the problem. The portion of the ambient sound that the end-user can hear will interfere with the programming delivered by the personal audio decoder.
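To give a sense of scale for the mismatch between the two paths, the following sketch estimates how far the airborne sound lags the electronically delivered signal. It assumes a speed of sound of about 343 m/s (dry air near room temperature), treats the electronic path as instantaneous, and ignores reverberation and any decoding delay; the function names and the 48 kHz sample rate are illustrative assumptions.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for dry air near 20 degrees C


def acoustic_lag_seconds(distance_m):
    """Time for sound to travel distance_m from the loudspeaker to the listener."""
    return distance_m / SPEED_OF_SOUND_M_PER_S


def acoustic_lag_samples(distance_m, sample_rate_hz=48000):
    """The same lag expressed in audio samples at the given sample rate."""
    return round(acoustic_lag_seconds(distance_m) * sample_rate_hz)


# A listener seated 3.43 m from the loudspeakers hears the airborne program
# roughly 10 ms after the electronically delivered copy.
lag_s = acoustic_lag_seconds(3.43)
lag_n = acoustic_lag_samples(3.43)
```

Even at ordinary living-room distances the lag amounts to hundreds of samples, which is why the two copies of the program cannot simply be added together.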

One solution contemplated by the present invention is to provide the end-user with the ability to block the ambient sound while delivering the signal from the VRA personal decoder. This is accomplished by using an earplug as shown in FIG. 7.

While this method will work up to the limits of the earplug's ambient noise rejection capability, it has a notable drawback. For someone to enjoy a program with another person, it will likely be necessary to communicate easily while the program is ongoing. The earplug will not only block the primary audio source (which interferes with the decoded audio entering the hearing aid), but will also block any other ambient noise indiscriminately. In order to selectively block the ambient noise generated from the primary audio reproduction system without affecting the other (desirable) ambient sounds, more sophisticated methods are required. Note that similar comments can be made concerning the acceptability of using headset decoders. The headset earcups provide some level of attenuation of ambient noise but interfere with communication. If this is not important to a hearing impaired end-user, this approach may be acceptable.

What is needed is a way to avoid the latency problems associated with airborne transmission of digital audio programming while allowing the hearing impaired listener to interact with other viewers in the same room. FIG. 8 shows a block diagram of the signal paths reaching the hearing impaired end-user through the digital decoder enabled hearing aid. The pure (decoded) digital audio “S” goes directly to the hearing aid “HA” and can be modified by an end-user adjustable amplifier “w₂”. This digital audio signal also travels through the primary delivery system and room acoustics (G₁) before arriving at the hearing aid transducer. In addition to this signal, “d” exists and represents the desired ambient sounds such as friends talking. This total signal reaching the microphone is also end-user adjustable by the gain (possibly frequency dependent) “w₁”. The first problem is that the signal S modified by G₁ interferes with the pure digital audio signal coming from the hearing aid decoder, while the desired room audio is delivered through the same signal path. A second problem exists when the physical path through the hearing aid is included, and it is assumed that the end-user has some ability to hear audio through that path (represented by “G”). What actually arrives at the ear is a combination of the room audio amplified by w₁, the decoder signal amplified by w₂, and the room audio suppressed by “G”. What is desired from the entire system is a simple end-user adjustable mix between the hearing impaired modified decoder output and the desired signal existing in the room. Since there is a separate measurement of the decoder signal being transmitted to the end-user, this end result is possible by using adaptive feedforward control.

FIG. 9 illustrates a reconstructed block diagram incorporating an adaptive filter (labeled “AF”). There is one important assumption that underlies the method for adaptive filtering presented in this embodiment: the transmission path through “G” in FIG. 8 is essentially negligible. In physical terms this means that the passive noise control performance of the hearing aid itself is sufficient to reject the ambient noise arriving at the end-user's ear. (Note also that G includes the amount of hearing impairment that the individual has; if it is sufficiently high, this sound path will also be negligible.) If this is not the case, measures should be taken to add additional passive control to the hearing aid itself so the physical path (not the electronic path) from the environment to the end-user's eardrum has a very high insertion loss. The dotted line in FIG. 9 represents the hearing aid itself. There are two audio inputs: the hearing aid microphone picking up all ambient noise (including the audio programming from the primary playback device speakers that has not been altered by the hearing impaired modes discussed earlier) and the digital audio signal that has been decoded and adjusted for optimal listening for a hearing impaired individual. As mentioned earlier, the difficulty with the hearing aid microphone is that it picks up both the desired ambient sounds (conversation) and the latent audio program. This audio program signal will interfere with the hearing impaired audio program (decoded separately). Simply reducing the volume level of the hearing aid microphone will remove the desired audio. The solution as shown in FIG. 9 is to place an adaptive noise canceling algorithm on the microphone signal, using the decoder signal as the reference. Since adaptive filters will only attempt to cancel signals for which they have a coherent reference signal, the ambient conversation will remain unaffected. Therefore the output of the adaptive filter can be amplified separately via w₁ as the desired ambient signal, and the decoded audio can be amplified separately via w₂. The inherent difficulty with this method is that the bandwidth of the audio program that requires canceling may exceed the capabilities of the adaptive filter.
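The idea of canceling only the part of the microphone signal that is coherent with the decoder signal can be illustrated with a least-mean-squares (LMS) adaptive FIR filter. The text does not name a specific adaptive algorithm, so LMS is an assumption made for this sketch; the room path `G1`, the step size, the filter length, and the test signals are likewise hypothetical.

```python
import math
import random


def lms_cancel(reference, microphone, num_taps=3, mu=0.01):
    """Remove the component of `microphone` that is coherent with `reference`.

    Returns the final filter weights and the residual signal, which should
    approximate the ambient sound that is not correlated with the reference.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    residual = []
    for s, m in zip(reference, microphone):
        history = [s] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = m - estimate  # microphone minus estimated program audio
        weights = [w + mu * error * x for w, x in zip(weights, history)]
        residual.append(error)
    return weights, residual


rng = random.Random(0)
n = 5000
S = [rng.gauss(0.0, 1.0) for _ in range(n)]  # decoder (reference) signal
G1 = [0.5, -0.3, 0.2]                        # hypothetical room response
d = [0.1 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]  # desired ambient sound
room = [sum(G1[j] * S[k - j] for j in range(len(G1)) if k - j >= 0) for k in range(n)]
mic = [r + a for r, a in zip(room, d)]       # microphone picks up G1*S + d

weights, residual = lms_cancel(S, mic)
```

After adaptation, `weights` should sit close to `G1` and `residual` close to `d`: the program audio is removed while the uncorrelated ambient sound passes through, ready to be scaled by w₁.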

One other possibility is available that combines adaptive feedforward control with fixed gain feedforward control. This option, illustrated in FIG. 10, is more general in that it does not require that the acoustic path through the hearing aid be negligible. This path is removed from the signal hitting the ear by taking advantage of the fact that it is possible to determine the frequency response (transmission loss) of the hearing aid itself, and to use that estimate to eliminate the contribution to the overall pressure hitting the ear. FIG. 10 illustrates a combination of the entire hearing aid plant and the control mechanism. The plant components are described first. The decoder signal “S” is sent to the hearing aid decoder (as discussed earlier) for processing of the hearing impaired or center channel for improved intelligibility (processing not shown). The same signal is also delivered to the primary listening environment and through those acoustics, all represented by G₁. Also in the listening environment are audio signals that are desired such as conversation, represented by the signal “d”. The combination of these two signals (G₁S+d) is received by the hearing aid microphone at the surface of the listener's ear. This same acoustic signal travels through the physical components of the hearing aid itself, represented by G₂. If the hearing aid has effective passive control, this transfer function can be quite small, as assumed earlier. If not, the acoustic or vibratory transmission path can become significant. This signal enters the ear canal behind the hearing aid and finally travels through any hearing impairment that the end-user may have (represented by G₃) to the auditory nerve. Also traveling through the hearing aid is the electronic version of the ambient noise (amplified by w₁) combined with the (already adjusted) hearing impaired decoder signal (amplified by w₂). The end-user adjusted combination of these two signals represents the mixture between ambient noise and the pure decoder signal that has already been modified by the same end-user to provide improved intelligibility. To understand the effects of the two control mechanisms, consider that the adaptive filter (AF) and the plant estimate Ĝ₂ (G₂ with a hat on top) are both zero (i.e., no control is in place). The resulting output arriving at the end-user's ear becomes

G₃G₂d + G₃G₂G₁S + G₃Hw₂S + G₃Hw₁d + G₃Hw₁G₁S

Ideally, the hearing aid (H) will invert the hearing impairment, G₃. Therefore, in the last three terms, where both G₃ and H appear, the product G₃H will be approximately one. The resulting equation is then

w₂S + w₁d + G₃G₂d + G₃G₂G₁S + w₁G₁S

This does not provide the sound quality needed. While the desired and decoder signals do have level adjustment capability, the last three terms will deliver significant levels of distortion and latency through both the electrical and physical signal paths. The desired result is a combination of the pure decoder signal and the desired ambient audio signal where the end-user can control the relative mix between the two with no other signals in the output. The variables “S” and “d+G₁S” are available for direct measurement and the values of H, w₁, and w₂ are controllable by the end-user. This combination of variables permits the adjustment capability desired. If the adaptive filter and the plant estimate (Ĝ₂, i.e., G₂ hat) are now included in the equation for the output to the end-user's nerve, it becomes:

w₁d + w₂S + w₁G₁S − w₁(AF)S + G₃G₂(d+G₁S) − G₃Ĝ₂(d+G₁S)

Now, if the adaptive filter converges to the optimal solution, it will be identical to G₁ so that the third and fourth terms in the above equation cancel. And if the estimate of G₂ approaches G₂ due to a good system identification, the last two terms in the previous equation will also cancel. This leaves only the decoder signal “S” end-user modified by w₂ and the desired ambient sound “d” end-user modified by w₁, the desired result. The limits of the performance of this method depend on the performance of the adaptive filter and on the accuracy of the system identification from the outside of the hearing aid to the inside of the hearing aid while the end-user has it comfortably in position. The system identification procedure itself can be carried out in a number of ways, including a least mean squares fit.
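The cancellation argument can be checked numerically with a scalar model. The sketch below is only an illustration under stated assumptions: every path is treated as a single real gain rather than a frequency-dependent response, the product G₃H is taken to be exactly one, the adaptive filter output and the plant estimate are assumed to enter with a negative sign, and the least-squares routine `fit_gain` is a hypothetical stand-in for the system identification step. All numeric values are invented for the example.

```python
def fit_gain(outside, inside):
    """Least-squares estimate of a scalar gain g such that inside ~ g * outside."""
    numerator = sum(o * i for o, i in zip(outside, inside))
    denominator = sum(o * o for o in outside)
    return numerator / denominator


def ear_output(S, d, G1, G2, G3, w1, w2, AF=0.0, G2_hat=0.0):
    """Scalar model of the signal reaching the end-user (assumes G3 * H == 1)."""
    room = d + G1 * S                           # what the microphone picks up
    electronic = w1 * (room - AF * S) + w2 * S  # adaptive filter removes AF * S
    physical = G3 * G2 * room                   # leakage through the hearing aid
    correction = G3 * G2_hat * room             # fixed-gain feedforward term
    return electronic + physical - correction


# Hypothetical gains and signal values.
S, d = 1.0, 0.5
G1, G2, G3 = 0.8, 0.05, 0.5
w1, w2 = 0.7, 1.5

# Identify G2 from probe measurements outside and inside the hearing aid.
probe_outside = [1.0, -2.0, 0.5, 3.0]
probe_inside = [G2 * x for x in probe_outside]
G2_hat = fit_gain(probe_outside, probe_inside)

uncontrolled = ear_output(S, d, G1, G2, G3, w1, w2)
controlled = ear_output(S, d, G1, G2, G3, w1, w2, AF=G1, G2_hat=G2_hat)
```

With `AF` equal to `G1` and `G2_hat` equal to `G2`, `controlled` reduces to `w1 * d + w2 * S`, the end-user adjustable mix the text describes, whereas `uncontrolled` still contains the room-filtered program and the leakage terms.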

Interception Box

FIG. 11 illustrates another embodiment according to the present invention. FIG. 11 shows the features of a VRA set top terminal used for simultaneously transmitting a VRA adjustable signal to multiple end-users.

VRA set top terminal 60 includes a decoder 61 for decoding a digital bitstream supplied by a digital source such as a digital TV, DVD, etc. Decoder 61 decodes the digital bitstream and outputs digital signals which have a preferred audio component (PA) and a remaining audio portion (RA). The digital signals are fed into digital-to-analog (D/A) converters 62 and 69, which convert the digital signals into analog signals. The analog signals from D/A converter 62 are fed to transmitter 63 to be transmitted to receivers such as receivers 270 shown in FIG. 5. Thus, multiple end-users with individual listening devices can adjust the voice-to-remaining audio for each of their individual devices. The output from D/A converter 69 is sent to a playback device such as analog television 290.

FIG. 12 illustrates an alternative embodiment of the present invention. As in FIG. 11, a bitstream is received by decoder 61 of VRA set-top-terminal 60. Decoder 61 outputs digital signals which are sent to D/A converter 62. The outputs of D/A converter 62 are analog signals sent to transmitter 63 for transmission to receivers 270. D/A converter 62 also feeds its output analog signals to variable amplifiers 225 and 226 for end-user adjustments before being downmixed by summer 227. This output signal is fed to analog television 290 in a similar manner as discussed above with respect to FIG. 11, but having already been VRA adjusted. According to this embodiment of the present invention, not only will hearing impaired end-users employing receivers 270 enjoy VRA adjustment capability, but end-users listening to analog television will have the same capability.

While many changes and modifications can be made to the invention within the scope of the appended claims, such changes and modifications are within the scope of the claims and covered thereby.

1. A set-top-terminal for providing voice-to-remaining audio capability comprising: a decoder for decoding a bitstream and producing as its output, a digital preferred audio signal and a said digital remaining audio signal; a digital to analog (D/A) converter coupled to said decoder, said D/A converter converting said digital preferred audio signal and a digital remaining audio signal into an analog preferred audio signal and an analog remaining audio signal; a transmitter coupled to said D/A converter and transmitting said analog preferred audio signal and said analog remaining audio signal; a first end-user adjustable amplifier coupled to said analog preferred voice signal and amplifying said analog preferred voice signal; a second end-user adjustable amplifier coupled to said analog remaining audio signal and amplifying said analog remaining audio signal; and a summer coupled to outputs of said first and second end-user adjustable amplifiers and outputting a total audio signal.

2. The set-top-terminal of claim 1, wherein an output of the summer outputting said total audio signal is coupled to an analog receiving device.

3. A method for processing a digital bitstream from a set-top terminal, comprising: decoding a bitstream to produce a digital preferred audio signal and a digital remaining audio signal; converting said digital preferred audio signal and said digital remaining audio signal into an analog preferred audio signal and an analog remaining audio signal; transmitting said analog preferred audio signal to a first end-user adjustable amplifier coupled to receive said analog preferred audio signal and said analog remaining audio signal to a second end-user adjustable amplifier coupled to receive said analog remaining audio signal; amplifying said analog preferred voice signal with said first end-user adjustable amplifier; amplifying said analog remaining audio signal with said second end-user adjustable amplifier; and summing output from said first and second end-user adjustable amplifiers and outputting a total audio signal to an individual listening device.

4. The method of claim 3, wherein outputting said total audio signal includes outputting said total audio signal to an analog receiving device.

5. The method of claim 3, further comprising: employing a microphone incorporated into a listening device to detect an ambient environmental sound; and further processing the digital bitstream based on detected ambient environmental sound.